Improving Genre Annotations for the Million Song Dataset

Author

Hendrik Schreiber

Abstract

Any automatic music genre recognition (MGR) system must show its value in tests against a ground truth dataset. Recently, the public dataset most often used for this purpose has been shown to be problematic because of mislabeling, duplications, and its relatively small size. Another dataset, the Million Song Dataset (MSD), a collection of features and metadata for one million tracks, unfortunately does not contain readily accessible genre labels. Multiple attempts have therefore been made to add song-level genre annotations, which are required for supervised machine learning tasks. Thus far, the quality of these annotations has not been evaluated. In this paper we present a method for creating additional genre annotations for the MSD from databases that contain multiple crowd-sourced genre labels per song (Last.fm, beaTunes). Based on label co-occurrence rates, we derive taxonomies that allow inference of the top-level genres most often used in MGR systems. We then combine multiple datasets using majority voting. This both promises a more reliable ground truth and allows the evaluation of the newly generated and preexisting datasets. To facilitate further research, all derived genre annotations are publicly available on our website.
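The two ideas named in the abstract, inferring a top-level genre from label co-occurrence and combining sources by majority vote, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the actual taxonomy-building procedure or any thresholds, so the function names, the 0.5 threshold, and the rule "assign each label to the top-level genre it co-occurs with most often" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_rates(songs):
    """Return rate[(a, b)]: fraction of songs labeled a that also carry label b."""
    label_count = Counter()
    pair_count = Counter()
    for labels in songs:
        unique = set(labels)
        label_count.update(unique)
        pair_count.update(permutations(unique, 2))
    return {(a, b): n / label_count[a] for (a, b), n in pair_count.items()}


def infer_parents(rates, top_level, threshold=0.5):
    """Map each label to the top-level genre it co-occurs with most often.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    parents = {g: g for g in top_level}
    best = {}
    for (a, b), rate in rates.items():
        if a in top_level or b not in top_level or rate < threshold:
            continue
        if rate > best.get(a, 0.0):
            best[a] = rate
            parents[a] = b
    return parents


def majority_vote(annotations):
    """Return the genre with a strict majority among non-missing votes, else None."""
    votes = Counter(g for g in annotations if g is not None)
    if not votes:
        return None
    genre, n = votes.most_common(1)[0]
    return genre if n > sum(votes.values()) / 2 else None


# Toy data: each inner list holds the crowd-sourced labels of one song.
songs = [
    ["rock", "indie rock"],
    ["rock", "indie rock"],
    ["pop", "indie rock"],
    ["rock"],
    ["pop"],
]
rates = cooccurrence_rates(songs)
parents = infer_parents(rates, top_level={"rock", "pop"})
```

In this toy data, "indie rock" appears with "rock" in two of its three songs, so it is mapped to "rock". A song then receives a final label only when a strict majority of the sources that annotate it agree; ties yield no label.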

Similar resources

Listen To Me - Don't Listen To Me: What Communities of Critics Tell Us About Music

Social knowledge and data sharing on the Web takes many forms. So too do the ways people share ideas and opinions. In this paper we examine one such emerging form: the amateur critic. In particular, we examine genius.com, a website which allows its users to annotate and explain the meaning of segments of lyrics in music and other written works. We describe a novel dataset of approximately 700,0...

Music Genre Classification with the Million Song Dataset 15-826 Final Report

The field of Music Information Retrieval (MIR) draws from musicology, signal processing, and artificial intelligence. A long line of work addresses problems including: music understanding (extract the musically-meaningful information from audio waveforms), automatic music annotation (measuring song and artist similarity), and other problems. However, very little work has scaled to commercially ...

Lyrics-Based Music Genre Classification Using a Hierarchical Attention Network

Music genre classification, especially using lyrics alone, remains a challenging topic in Music Information Retrieval. In this study we apply recurrent neural network models to classify a large dataset of intact song lyrics. As lyrics exhibit a hierarchical layer structure—in which words combine to form lines, lines form segments, and segments form a complete song—we adapt a hierarchical attent...

Audio-based Music Classification with a Pretrained Convolutional Network

Recently the ‘Million Song Dataset’, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all a...

Adapting Metrics for Music Similarity Using Comparative Ratings

Understanding how we relate and compare pieces of music has been a topic of great interest in musicology as well as for business applications, such as music recommender systems. The way music is compared seems to vary among both individuals and cultures. Adapting a generic model to user ratings is useful for personalisation and can help to better understand such differences. This paper presents...

Publication date: 2015
